SMCV: a Methodology for Detecting Transient Faults in Multicore Clusters
Abstract
Performance gains in current processors are achieved by increasing the integration scale. This brings a growing vulnerability to transient faults, whose impact increases on multicore clusters running large scientific parallel applications. The need to enhance the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates specific software strategies for the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of messages before they are sent, which prevents errors from propagating to other processes, and by leveraging the intrinsic hardware redundancy of the multicore. SMCV achieves wide robustness against transient faults with reduced overhead, and strikes a trade-off between moderate detection latency and low additional workload.
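The idea of validating a message before it leaves the process can be illustrated with a minimal sketch. It assumes that the redundancy takes the form of two replicas of the same computation whose outputs are compared before sending; the abstract does not spell out this mechanism, and the names `validated_send`, `compute`, `send`, and `TransientFaultDetected` are hypothetical, not part of SMCV or of any message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


class TransientFaultDetected(Exception):
    """Raised when the two replicas disagree on the message to send."""


def validated_send(compute, args, send):
    """Run `compute` twice, compare the results, and send only if they match.

    Assumption: `compute` is deterministic, so any difference between the
    two results is treated as the effect of a transient fault.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Two replicas of the same computation (hypothetical stand-in for
        # redundant execution on separate cores).
        futures = [pool.submit(compute, *args) for _ in range(2)]
        primary, replica = (f.result() for f in futures)

    # Validate the message contents before anything leaves the process.
    if primary != replica:
        raise TransientFaultDetected("replicas disagree; message withheld")

    send(primary)
    return primary
```

The sketch only shows the compare-before-send logic. Python threads do not place replicas on distinct cores, so a real implementation would need native threads or processes and an actual message-passing call in place of `send`.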
Journal: CLEI Electron. J.
Volume: 15
Publication date: 2012